Deployment fails with Concurrent::RejectedExecutionError #168

Raniz85 · 2018-06-04T09:51:08Z

We've seen similar failures on multiple servers; they seem to happen randomly.

Here's the log message:

2018-05-31 15:31:07 INFO  [codedeploy-agent(8052)]: [Aws::CodeDeployCommand::Client 200 0.030218 0 retries] put_host_command_complete(command_status:"Failed",diagnostics:{format:"JSON",payload:"{\"error_code\":5,\"script_name\":\"\",\"message\":\"Concurrent::RejectedExecutionError\",\"log\":\"\"}"},host_command_identifier:"<redacted>")

The redacted base64 string decodes into:

["com.amazon.apollo.deploycontrol.domain.HostCommandIdentifier",{"deploymentId":"CodeDeploy/eu-west-1/Prod/arn:aws:sds:eu-west-1:<accountId>:deployment/<deploymentId>","hostId":"arn:aws:ec2:eu-west-1:<accountId>:instance/<instanceId>","commandName":"AfterAllowTraffic","commandPosition":13,"commandAttempt":1}]
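For anyone who wants to inspect their own host_command_identifier, here is a minimal Python sketch of the decoding step. The sample payload is built inside the script from placeholder values (it is not a real identifier), and the helper name is hypothetical. It assumes the decoded JSON is a two-element list of type name and fields, as in the output above.

```python
import base64
import json


def decode_host_command_identifier(encoded: str) -> dict:
    """Decode a base64 host_command_identifier into its field dictionary.

    Assumes the decoded JSON is a two-element list, [type_name, fields],
    matching the shape pasted in this issue.
    """
    type_name, fields = json.loads(base64.b64decode(encoded))
    return fields


# Placeholder payload for illustration only; not a real identifier.
sample = [
    "com.amazon.apollo.deploycontrol.domain.HostCommandIdentifier",
    {"commandName": "AfterAllowTraffic", "commandPosition": 13, "commandAttempt": 1},
]
encoded_sample = base64.b64encode(json.dumps(sample).encode("utf-8")).decode("ascii")

fields = decode_host_command_identifier(encoded_sample)
print(fields["commandName"], fields["commandPosition"])
```

The commandName field is the quickest way to see which lifecycle hook the failed command belonged to.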


rohkat-aws · 2018-06-04T17:06:15Z

@Raniz85 we have a fix for this bug in the next release, but there are workarounds in the meantime. Can you tell me a little more about when it's happening, and maybe paste some agent logs from just before it happens?

Raniz85 · 2018-06-05T08:01:48Z

Here are the logs from the instance: cd-agent-crash.log

Note that we restart the agent 2 minutes after each deployment as a workaround to #32.

rohkat-aws · 2018-06-06T20:38:23Z

@Raniz85 here is my hypothesis: when the agent restarts and begins exiting, it should not accept any new poll requests, but it does because of a bug in the agent's thread synchronization. The fix has been added in ef65652 and will ship in the next release. For now, the workaround is to wait for the agent to finish restarting before starting a deployment, for example by adding a wait. I know it's not ideal, but it can work as a temporary workaround.
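To illustrate the kind of race described here, the following is an analogy in Python, not the agent's actual Ruby code. A thread pool that has begun shutting down rejects new work, so a poller that keeps submitting after shutdown gets an error. Python's `ThreadPoolExecutor` raises `RuntimeError` in that situation, which plays the role of `Concurrent::RejectedExecutionError`. The `Poller` class and its stop flag are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Poller:
    """Hypothetical poller that hands commands to a thread pool."""

    def __init__(self) -> None:
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._lock = threading.Lock()
        self._stopping = False

    def submit_unsafe(self, command):
        # No check: raises if the pool has already been shut down.
        return self._pool.submit(command)

    def submit_guarded(self, command):
        # Check the stop flag under the same lock that shutdown() takes,
        # so no command is accepted once shutdown has begun.
        with self._lock:
            if self._stopping:
                return None  # decline new work instead of raising
            return self._pool.submit(command)

    def shutdown(self) -> None:
        with self._lock:
            self._stopping = True
        self._pool.shutdown(wait=True)


poller = Poller()
poller.shutdown()

try:
    poller.submit_unsafe(lambda: "deploy")
    rejected = False
except RuntimeError:
    rejected = True

print("unsafe submit rejected:", rejected)
print("guarded submit result:", poller.submit_guarded(lambda: "deploy"))
```

The guarded variant declines work once shutdown has started instead of letting the pool raise, which is the general shape of fix the hypothesis points at.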

Raniz85 · 2018-06-07T07:29:23Z

I'm not sure I understand the proposed workaround.

Should we ensure that we're never deploying when an agent is restarting? That's not an easy workaround in an environment with more than one or two servers.

When can we expect the next release?

rohkat-aws · 2018-06-07T16:40:42Z

@Raniz85 I totally understand that; it's happening due to a race condition. Another workaround could be to add a wait after the restart, such as a sleep for a few seconds, and then start deploying. We are still working out some issues for the next release, but will post here once we start releasing the next version.

Raniz85 · 2018-06-08T08:16:22Z

I still don't understand what you want me to do.

We have the agent running on about 100 instances and deployments are automated via Jenkins. Do you suggest that we add a hook in Jenkins that ensures that no agent has restarted on any of the 100 instances within the last 10 seconds (or so) before starting the deployment?

rohkat-aws · 2018-06-12T21:43:17Z

@Raniz85 which region are the hosts in?

Raniz85 · 2018-06-18T12:24:58Z

eu-west-1 mostly though some are in cn-north-1 and us-east-2.

rohkat-aws · 2018-06-18T16:55:18Z

In eu-west-1 we have released the new version of the agent (1518), which should fix this issue. Can you try that?

SupportNubersia · 2018-06-19T12:35:43Z

Hi @rohkat-aws, we have the same problem with the CodeDeploy agent.

We use the eu-west-1 region. We tried to update the agent (agent_version: OFFICIAL_1.0-1.1518_rpm) to the new version and got this output:

sudo /opt/codedeploy-agent/bin/install auto

I, [2018-06-19T12:26:13.508928 #25330]  INFO -- : Starting Ruby version check.
I, [2018-06-19T12:26:13.509033 #25330]  INFO -- : Starting update check.
I, [2018-06-19T12:26:13.509064 #25330]  INFO -- : Attempting to automatically detect supported package manager type for system...
I, [2018-06-19T12:26:13.517030 #25330]  INFO -- : Checking AWS_REGION environment variable for region information...
I, [2018-06-19T12:26:13.517100 #25330]  INFO -- : Checking EC2 metadata service for region information...
I, [2018-06-19T12:26:13.550405 #25330]  INFO -- : Downloading version file from bucket aws-codedeploy-eu-west-1 and key latest/VERSION...
I, [2018-06-19T12:26:13.571773 #25330]  INFO -- : Running version matches target version, skipping install
I, [2018-06-19T12:26:13.571839 #25330]  INFO -- : Update check complete.
I, [2018-06-19T12:26:13.571858 #25330]  INFO -- : Stopping updater.

Can you confirm that the new version is released?

Regards,

SupportNubersia · 2018-06-19T13:13:36Z

Hi @rohkat-aws, when you say
"could be you just add a wait after the restart like a sleep for some seconds and then start deploying"
what do you mean?
Could it be the :wait_between_runs: variable in codedeployagent.yml?
Any suggestions?
Greetings and thanks.

woodhull · 2018-06-19T14:46:45Z

We're seeing this issue as well. Has the fix been released to the us-east-1 codedeploy install S3 bucket?

I do not understand the proposed workaround from @rohkat-aws either.

rohkat-aws · 2018-06-19T14:49:39Z

@SupportNubersia @woodhull 1.0-1.1518_rpm is the new version. Are you still seeing issues?
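A quick way to check an instance's reported version against that build, sketched in Python. It assumes the version string has the shape seen in this thread (`OFFICIAL_1.0-1.1518_rpm`), with the build number as the last dotted component before the package suffix. Other formats are not handled, and the function names are hypothetical.

```python
import re

FIXED_BUILD = 1518  # build named in this thread as containing the fix


def build_number(agent_version: str) -> int:
    """Extract the build number from a string like 'OFFICIAL_1.0-1.1518_rpm'."""
    match = re.search(r"-\d+\.(\d+)_", agent_version)
    if match is None:
        raise ValueError(f"unrecognized agent version: {agent_version!r}")
    return int(match.group(1))


def has_fixed_build(agent_version: str) -> bool:
    """True if the reported build is at least FIXED_BUILD."""
    return build_number(agent_version) >= FIXED_BUILD


print(has_fixed_build("OFFICIAL_1.0-1.1518_rpm"))
print(has_fixed_build("OFFICIAL_1.0-1.1458_rpm"))
```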

woodhull · 2018-06-19T14:53:09Z

We've so far only seen this issue on fresh instance boot, so we're rebaking our AMIs in the hope that we get a fresher version of the codedeploy agent. We'll let you know once that is complete, the AMI is rolled out, and we can test.

SupportNubersia · 2018-06-19T15:07:18Z

Hi @rohkat-aws

The version agent_version: OFFICIAL_1.0-1.1518_rpm that exists in my AMIs was created more than a month ago.

Can you confirm that the new release of code-agent is 1.0-1.1518?

SupportNubersia · 2018-06-19T15:08:34Z

To install the codedeploy-agent we used the official process.

https://docs.aws.amazon.com/codedeploy/latest/userguide/codedeploy-agent-operations-install-linux.html

rohkat-aws · 2018-06-19T17:06:46Z

@SupportNubersia @woodhull yes, OFFICIAL_1.0-1.1518_rpm should fix this.
And @woodhull, the workaround that was suggested was for 1458 (the previous agent version). All I meant by it was to wait until the agent has completely restarted, or started up again, before sending it a deployment request.
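One way to read this workaround, sketched in Python: poll until the agent reports itself ready, and only then trigger the deployment. The `agent_is_ready` callable is a hypothetical stand-in; how the agent is actually probed (for example, through its service status on the instance) is left to the caller, and the timeout values are arbitrary.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll `check` until it returns True or `timeout` seconds elapse.

    Returns True if the check succeeded in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for a real readiness probe: reports ready on the third call.
calls = {"count": 0}


def agent_is_ready() -> bool:
    calls["count"] += 1
    return calls["count"] >= 3


if wait_until(agent_is_ready, timeout=5.0, interval=0.01):
    print("agent ready, safe to start the deployment")
else:
    print("agent did not become ready in time")
```

A bounded poll like this avoids a fixed sleep, but it still has to run per instance, which is the scaling concern raised earlier in the thread.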

SupportNubersia · 2018-06-20T07:11:46Z

Hi @rohkat-aws
How can we wait until the agent has completely restarted or started up again before sending it a deployment request?
The problem occurs when the Auto Scaling group launches new instances.

Please, we need to solve this ASAP.

rohkat-aws · 2018-06-20T07:21:38Z

@SupportNubersia Are you seeing the issue even with the latest version of the agent? Is it possible the AMI is pre-baked with an old version of the agent? And when you say it happens during scale-up, is it because the agent is being restarted in the launch config after the install?

SupportNubersia · 2018-06-20T07:28:06Z

@rohkat-aws the version baked into our AMI, which was prepared a month ago, is OFFICIAL_1.0-1.1518_rpm, and we get the same error.
What we see is that the system runs the whole process up to the execution of the StartApplication hook (other times it works correctly).
Do you advise us to restart the CodeDeploy agent in the LaunchConfiguration?
We are going to remove the CodeDeploy agent installation and install it again.

rohkat-aws · 2018-06-20T07:43:30Z

The 1518 version was released a week back, not a month back, @SupportNubersia. Having said that, can you please look into /tmp/codedeploy-agent.update.log and confirm whether it's being updated?

SupportNubersia · 2018-06-20T08:06:52Z

Hi @rohkat-aws, we can see what the error is: when an instance launches for the first time, it runs an update, which kills our deployment.

Now we're updating the AMI with the latest codedeploy-agent version and we will try again.

How can we disable this automatic update on first execution?

rohkat-aws · 2018-06-20T14:47:24Z

@SupportNubersia Did that work?

woodhull · 2018-06-20T14:57:06Z

We think that this is now fixed for us after baking a new AMI with the latest codedeploy agent version preinstalled.

SupportNubersia · 2018-06-20T15:21:53Z

@rohkat-aws it's working well.

But on the next code-deploy update, the problem will occur again...

rohkat-aws · 2018-06-20T15:35:49Z

No, it should not, @SupportNubersia. This version fixes that, and this is the commit which does it:
ef65652#diff-9c9dfb7af94f7715489974ad6d37d7f3R76

rohkat-aws · 2018-06-20T15:39:45Z

@Raniz85 if you can also confirm, I can close the issue.

SupportNubersia · 2018-06-20T15:40:09Z

Ok @rohkat-aws perfect!

Thanks for your help

Raniz85 · 2018-06-20T16:03:51Z

We're preparing to upgrade our AMIs, but haven't gotten there yet. I probably won't have time tomorrow and then I'm on vacation next week, so I'll get back to you after that.

pags · 2018-06-26T12:56:24Z

Updating our AMIs with the newest CodeDeploy agent fixed the issue for us.

rohkat-aws · 2018-06-26T20:04:59Z

@Raniz85 I think we can reopen this if you still have issues. Closing this for now.

jgerry · 2018-06-28T22:37:57Z

I'm testing the new agents now and it seems to be working, but my issue is mostly that something changed internally on the AWS side that caused this. I bake AMIs for some applications specifically to not get new versions of the agents frequently. I have apps that are using agent versions from 4 months ago that suddenly started having problems.

petervandoros · 2018-07-19T00:43:50Z

@jgerry We delete the crontab for the autoupdate feature when baking the AMI. A couple of our systems weren't doing this and started seeing this error when the agent was updating itself to the latest version mid-deploy. That is, the old version was failing to shut down mid-deploy when the autoupdate feature triggered it.
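A minimal Python sketch of that bake-time step. The cron file path below is an assumption about where the installer places the updater's cron entry, not something confirmed in this thread, so verify it on your own image. The demonstration runs against a temporary file rather than the real system path.

```python
import tempfile
from pathlib import Path

# Assumption: the updater's cron entry lives at this path; verify on your image.
ASSUMED_CRON_FILE = Path("/etc/cron.d/codedeploy-agent-update")


def remove_cron_entry(cron_file: Path) -> bool:
    """Delete the cron file if present. Returns True if a file was removed."""
    try:
        cron_file.unlink()
        return True
    except FileNotFoundError:
        return False


# Demonstration against a temporary file, not the real system path.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "codedeploy-agent-update"
    demo.write_text("placeholder cron line\n")
    print(remove_cron_entry(demo))  # file existed
    print(remove_cron_entry(demo))  # already gone
```

During an AMI bake, the call would be `remove_cron_entry(ASSUMED_CRON_FILE)` run with sufficient privileges, after the agent version you want to pin has been installed.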

mysteriouskangaroo · 2018-08-21T15:47:53Z

@rohkat-aws unfortunately, we too are running into this issue about half of the time with a simple scale-up from one to two servers. We have the latest OFFICIAL_1.0-1.1518_rpm on us-east-1. On autoscale, we push out two repos concurrently that are relatively small. They do, however, call out to composer to pull in external packages.

Our solution likely for now is going to be to just bake everything into AMIs, which is cumbersome :/. Is there any ETA on a fix?

rohkat-aws · 2018-08-21T23:31:59Z

@falcor781 is an update happening, or was the agent that was used already 1518? Can you please also check your update logs?

mysteriouskangaroo · 2018-08-22T03:37:59Z

The agent that was used was already 1518 starting from the AMI. I'm not sure which 'update logs' you are referring to as it is starting from an AMI snapshot; nothing should be getting updated.

rohkat-aws · 2018-08-22T07:48:56Z

@falcor781 just to confirm, you are getting Concurrent::RejectedExecutionError?

rohkat-aws added the bug label Jun 4, 2018

rohkat-aws closed this as completed Jun 26, 2018
